Benchmark Comparison Workflow #331
base: master
Conversation
Why: When writing JSON/CSV, the standard locale set by `set_locale` breaks the JSON/CSV structure because of the presence of commas in large numbers.
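For illustration, here is a minimal Python sketch of the failure mode (the PR fixes it on the C++ side; the locale name is an assumption about the running system):

```python
import locale

# Illustrative only: requires the "en_US.UTF-8" locale to be installed.
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")

grouped = locale.format_string("%d", 12345678, grouping=True)
print(grouped)  # 12,345,678 -- the separators are now part of the text

# Writing such a value into a CSV row silently adds extra columns:
row = "benchmark_name," + grouped
print(row.split(","))  # ['benchmark_name', '12', '345', '678']
```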
Force-pushed from f96974e to 71ba004
Presence of commas in numbers breaks JSON/CSV output
This script can be used to compare the performance of two commits. It checks out, builds, and compares the benchmark outputs of both commits.

How to use:
python3 benchmark_compare.py abc123 def456
python3 benchmark_compare.py master your/feature/branch
This is a workflow that can be triggered by commenting "\benchmark" on PRs (or through manual dispatch). It uses `benchmark_compare.py` to run and compare benchmarks. The comparison result is posted as a comment on the PR.
Force-pushed from 71ba004 to bfe6010
Google Benchmark ships with a dedicated Python script to compare benchmarks. I recommend making use of that rather than rewriting something like it. Additionally, I think we should run the entire benchmark suite, not only the CC one.
I did see that. It outputs just the absolute differences; I wanted to see the percentage differences as well. I will check if there is a way to reuse it.
Yes, we do run the
- When benchmarking with branch names, the presence of '/' can cause issues; replace it with '-'.
- Output filename changed to "benchmark_comparison.txt".
- Add the ability to import the compare_benchmarks function for independent use.
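A one-line sketch of the sanitization described in the first point (the branch name is illustrative):

```python
# "/" is not allowed in file names, so branch names are flattened first.
safe_ref = "your/feature/branch".replace("/", "-")
print(safe_ref)  # your-feature-branch
```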
I have made a couple of changes:
Pull Request Overview
This PR adds CLI and CI support for benchmarking performance regressions, including locale formatting controls in the runtime, a custom CMake target for a specific benchmark filter, and a GitHub Actions workflow to compare benchmark outputs between two commits.
- Introduce `disable_thousands_separator` API and call it in `benchmarks/main.cpp` to ensure no digit grouping in JSON/CSV outputs.
- Add `sequant_benchmark_cc` custom CMake target for running only the `cc_full` benchmark.
- Create a workflow (`benchmark_compare.yml`) that triggers on PR comments or manual dispatch to run comparisons via `benchmark_compare.py`.
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `benchmarks/main.cpp` | Call `disable_thousands_separator()` right after setting the locale |
| `benchmarks/CMakeLists.txt` | Add `sequant_benchmark_cc` custom target to run the `cc_full` filter |
| `SeQuant/core/runtime.hpp` | Declare `disable_thousands_separator()` in the public API |
| `SeQuant/core/runtime.cpp` | Implement `disable_thousands_separator()` via locale facet overrides |
| `.github/workflows/benchmark_compare.yml` | Add workflow to compare benchmark outputs and post results as a comment |
Comments suppressed due to low confidence (3)
.github/workflows/benchmark_compare.yml:131
- The workflow uploads `${base_sha}-${head_sha}-comparison.txt` but reads `benchmark-comparison.txt`. Update the filename to match the actual uploaded artifact path, or use the dynamic names when reading.
const comparisonFile = `benchmark-comparison.txt`;
benchmarks/CMakeLists.txt:25
- [nitpick] The custom target name `sequant_benchmark_cc` is ambiguous; consider aligning the naming with `sequant_benchmarks_cc` or adding a more descriptive suffix to clarify that it runs only the `cc_full` benchmark.
add_custom_target(sequant_benchmark_cc
benchmarks/main.cpp:13
- [nitpick] Consider adding a brief inline comment explaining why disabling the thousands separator is needed in this context, e.g., to ensure CSV/JSON parsers receive ungrouped numbers.
disable_thousands_separator();
Force-pushed from 2107abf to 9667369
It should be pretty straightforward to compute these from the absolute differences, no? The benefit of making use of the "built-in" comparison would be that this script (almost certainly) will get updated along with any potential file-format changes in the benchmark output.
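For reference, a minimal sketch of deriving percentage differences from two Google Benchmark JSON outputs (the function and file names here are hypothetical, not taken from `benchmark_compare.py`):

```python
import json

def load_metrics(path: str, metric: str = "cpu_time") -> dict:
    # Map benchmark name -> metric value from a Google Benchmark JSON file.
    with open(path) as f:
        data = json.load(f)
    return {b["name"]: b[metric] for b in data["benchmarks"]}

def percent_diffs(base_file: str, head_file: str) -> dict:
    base, head = load_metrics(base_file), load_metrics(head_file)
    # Positive values mean the head commit is slower than the base commit.
    return {
        name: 100.0 * (head[name] - base[name]) / base[name]
        for name in base.keys() & head.keys()
    }

print(percent_diffs("base.json", "head.json"))
```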
if metric not in ["cpu_time", "real_time"]:
    raise ValueError("Invalid metric specified. Use 'cpu_time' or 'real_time'.")
I would recommend using an enum for the metric. This will make it obvious which values are valid.
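A sketch of that suggestion, assuming the two metric strings used above:

```python
from enum import Enum

class Metric(Enum):
    CPU_TIME = "cpu_time"
    REAL_TIME = "real_time"

# Construction fails loudly for anything outside the valid set:
metric = Metric("cpu_time")   # ok
# Metric("wall_time")         # raises ValueError
# Downstream code uses metric.value wherever the raw string is needed.
```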
This PR introduces the ability to compare performance on CI for PRs.
There are two parts to the workflow:
1. `benchmark_compare.py`, which can be used to compare the performance of two commits using their benchmark outputs.
2. The workflow `benchmark_compare.yml`, which can be triggered by commenting on PRs or through manual dispatch; it runs `benchmark_compare.py` and posts the comparison result as a comment on the PR.

How to use:
python3 benchmark_compare.py abc123 def456
python3 benchmark_compare.py master your/feature/branch
The script is self-contained and can be run locally. All build-related variables (e.g., CMake flags) are defined inside the script.
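As a hypothetical sketch of what "self-contained" means here (the CMake flags and function name are illustrative, not taken from the actual script):

```python
import subprocess

# Illustrative build configuration; the real script defines its own flags.
CMAKE_FLAGS = ["-DCMAKE_BUILD_TYPE=Release"]

def build_commit(sha: str, build_dir: str = "build") -> None:
    # Check out the requested commit, then configure and build it.
    subprocess.run(["git", "checkout", sha], check=True)
    subprocess.run(["cmake", "-B", build_dir, *CMAKE_FLAGS], check=True)
    subprocess.run(["cmake", "--build", build_dir], check=True)
```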
Caveats: this isn't 100% foolproof. GitHub-hosted runners can produce noisy benchmark results due to shared hardware, though in my testing this happened only rarely. It's not perfect, but it should help us catch any major performance regressions.
Notes: Ideally, we should run this on a self-hosted runner to get reliable numbers. GitHub does not recommend doing this for public repos because of security concerns.
This is my first time wiring up a somewhat complicated CI workflow, so I'm open to suggestions or improvements.